DOCUMENT RESUME 



ED 262 079                                         TM 850 560

AUTHOR          Cason, Gerald J.; Cason, Carolyn L.
TITLE           A Regression Solution to Cason and Cason's Model of
                Clinical Performance Rating: Easier, Cheaper, Faster.
PUB DATE        85
NOTE            10p.; Paper presented at the Annual Meeting of the
                American Educational Research Association (69th,
                Chicago, IL, March 31-April 4, 1985).
PUB TYPE        Speeches/Conference Papers (150) -- Reports -
                Research/Technical (143)
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     *Clinical Experience; Computers; Efficiency;
                *Estimation (Mathematics); Goodness of Fit; Higher
                Education; *Interrater Reliability; Mathematical
                Models; *Medical Students; *Regression (Statistics);
                Statistical Studies; *Student Evaluation; Validity



ABSTRACT 

A more familiar and efficient method for estimating
the parameters of Cason and Cason's model was examined. Using a
two-step analysis based on linear regression, rather than the direct
search iterative procedure, gave about equally good results while
providing a 33 to 1 computer processing time advantage, across 14
cohorts of junior medical students rated by residents and attending
physicians in a medicine clerkship. First, regression analysis of
z-transformed ratings produced regression weights (b's) associated
with each rater and subject. Then, the unit vector and person b's
were converted into inter-person distances, then person locations on
the underlying stringency and ability scale. Essentially equally good
fit with the data was achieved by the new method in 12 of the 14 data
sets. In the other two, fit was still quite good. Correlation between
parameter values estimated by the two methods was very high in groups
where equal fit was achieved. In the other two, moderately high
correlations were observed. The new method provided equally good
improvement in reliability and convergent validity of adjusted
scores. The improved economy and ease of application of the new
method further expanded the advantages of using the Casons' model to
statistically control rater bias. (Author/GDC)
Using a two-step analysis based on linear regression, rather than the less
familiar direct search iterative procedure used in previous research, gave about
equally good results while providing a 33:1 computer processing time advantage
to the new method across 14 cohorts of junior medical students rated by
residents and attending physicians in a Medicine Clerkship. In the first step,
regression analysis of z-transformed ratings produced regression weights (b's)
associated with each rater and subject. In the second step, the unit vector and
person b's were converted into theoretical inter-person distances, then person
locations (RRPs and SAPs) on the underlying stringency and ability scale.
Essentially equally good fit with the data was achieved by the new, faster
method in 12 of the 14 data sets (.82 ≤ R ≤ .95; p < .001). Even in the 2 cohorts
in which the new, faster method did not find quite as good fitting parameter
values, the fit was still quite good (R = .79 and .80; p < .005). Correlation
between parameter values estimated by the two methods was very high in groups
where equal fit with the data was achieved (.88 ≤ r ≤ 1.00). Even in the other
two, moderately high correlations were observed between parameter values
estimated by the two approaches (.84 ≤ r ≤ .91). In spite of some differences in
fit with the data, the new method provided equally good improvement in
reliability and convergent validity of adjusted scores. The increased economy
and ease of application of the new method further expanded the advantages of
using the Casons' model to statistically control rater bias, either as an
adjunct to or in the absence of adequate direct control methods such as rater
training, to improve the reliability and validity of clinical performance
measures.






Presented at the Annual Meeting of the American Educational Research
Association, Chicago, April 1985.

Reprints available from:

Gerald J. Cason, Ph.D.
Office of Educational Development 595
University of Arkansas for Medical Sciences
4301 West Markham
Little Rock, Arkansas 72205



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or organization
originating it.

Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily represent
official NIE position or policy.



A Regression Solution to Cason and Cason's Model of Clinical Performance Rating:

Easier, Cheaper, Faster

Gerald J. Cason and Carolyn L. Cason 
University of Arkansas for Medical Sciences 



This paper presents a more familiar and economical method, based on linear
regression, for estimating the parameters of Cason and Cason's (1984) model.
This improved parameter estimation approach will promote research on the Casons'
theory of performance rating and ease its application to the problem of
separating rater bias from true performance in practical performance assessment.

Theory

Cason and Cason's simplified model of their performance rating theory accounts
for all systematic variation in performance ratings exclusively by variation in
rater stringency and subject (e.g., student) ability. Cason and Cason (1984; G.
Cason et al., 1983; C. Cason et al., 1983) have presented evidence that the
model fits clinical performance data in a common type of health profession
education setting, i.e., where each student is rated by several but not all
raters and each rater rates several but not all students. Where sufficient
overlap in who rated whom occurred, variation in rater stringency could be
statistically controlled. This led to improved reliability and validity of the
adjusted performance ratings. These results were obtained at two different
schools, with different rating inventories, different amounts of rater training,
and at different levels of trait specificity. However, all prior research was
based on estimating rater stringencies and subject abilities using program
MERLIN in conjunction with program STEPIT (Chandler, 1965). MERLIN used STEPIT to
find the best values for subject ability points (SAPs) and rater reference
points (RRPs) in the sense of producing the least-squares fit with the observed
data. STEPIT finds local minima of continuous real functions by cyclic
relaxation and parabolic interpolation (a variation of direct search). STEPIT
was developed for quantum chemistry research and was both relatively slow in
this application (therefore expensive) and unfamiliar to most educational
researchers. It did have the initial advantage of being easy to apply to the
Casons' model.
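For readers unfamiliar with this class of method, the following toy sketch shows the general idea of cyclic relaxation with parabolic interpolation: each parameter in turn is probed at three points, a parabola is fitted through the three loss values, and the parameter is moved to the parabola's vertex if that lowers the loss. It is a generic illustration, not Chandler's STEPIT, whose internal details are not described here; the function names and default settings are hypothetical. Every probe costs a full evaluation of the loss, which suggests why direct search over many RRPs and SAPs is slow.

```python
from typing import Callable, List, Sequence, Tuple

Loss = Callable[[Sequence[float]], float]


def parabolic_step(f: Loss, x: Sequence[float], i: int, h: float) -> float:
    """Return a new value for coordinate i from a parabola through x_i - h, x_i, x_i + h."""
    probe = list(x)

    def at(t: float) -> float:
        probe[i] = t
        return f(probe)

    lo, mid, hi = at(x[i] - h), at(x[i]), at(x[i] + h)
    curvature = lo - 2.0 * mid + hi
    if curvature <= 0.0:
        # Not convex along this line: fall back to the best of the three probes.
        return min((lo, x[i] - h), (mid, x[i]), (hi, x[i] + h))[1]
    # Vertex of the interpolating parabola.
    return x[i] + h * (lo - hi) / (2.0 * curvature)


def cyclic_search(f: Loss, x0: Sequence[float], h: float = 1.0,
                  sweeps: int = 60, shrink: float = 0.5,
                  tol: float = 1e-9) -> Tuple[List[float], float]:
    """Minimize f by relaxing one coordinate at a time, shrinking the probe width."""
    x = list(x0)
    fx = f(x)
    for _ in range(sweeps):
        for i in range(len(x)):
            candidate = list(x)
            candidate[i] = parabolic_step(f, x, i, h)
            fc = f(candidate)
            if fc < fx:  # accept only improvements
                x, fx = candidate, fc
        h *= shrink
        if h < tol:
            break
    return x, fx


def toy_loss(v: Sequence[float]) -> float:
    """Toy least-squares style loss with its minimum at (3, -1)."""
    return (v[0] - 3.0) ** 2 + 2.0 * (v[1] + 1.0) ** 2


if __name__ == "__main__":
    best, value = cyclic_search(toy_loss, [0.0, 0.0])
    print([round(b, 6) for b in best], round(value, 12))
```

On a separable quadratic like `toy_loss` one sweep already lands on the minimum; on a real rating model the loss couples all RRPs and SAPs, so many sweeps and many loss evaluations are needed.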

In the Casons' model the expected subject rating (ESR), measured as a percent of
the maximum rating, is a function of the difference, z, between the rater's
stringency (i.e., the value associated with the Rater Reference Point or RRP)
and the subject's ability (i.e., the value associated with the Subject Ability
Point or SAP). In previous research this relationship was modified by an
arbitrary scaling factor (SF = 100).

$$ z = \frac{\mathrm{SAP} - \mathrm{RRP}}{\mathrm{SF}} $$

The theoretically postulated curvilinear relationship between z and the expected
subject rating (ESR) has been stipulated as the unit-normal ogive. Thus, the
ESR (in percent) for a given z is equal to the proportion of area under the
normal curve below z; that is, p(z) times 100:
$$ \mathrm{ESR} = p(z) \times 100 $$
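The model's prediction can be written as a short function. This is a minimal sketch, assuming ratings are expressed as a percent of the maximum and using the scaling factor SF = 100 from previous research; the function name `expected_subject_rating` is ours and is not part of MERLIN.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def expected_subject_rating(sap: float, rrp: float, sf: float = 100.0) -> float:
    """ESR in percent of the maximum rating: p(z) * 100 with z = (SAP - RRP) / SF."""
    z = (sap - rrp) / sf
    return UNIT_NORMAL.cdf(z) * 100.0


if __name__ == "__main__":
    # A subject located exactly at a rater's reference point: 50% of maximum.
    print(expected_subject_rating(500.0, 500.0))
    # One scaling factor above the rater's reference point: roughly 84% of maximum.
    print(round(expected_subject_rating(600.0, 500.0), 1))
    # One scaling factor below it (relative to a more stringent rater): roughly 16%.
    print(round(expected_subject_rating(400.0, 500.0), 1))
```

Because the ogive is symmetric, the ratings predicted for subjects equally far above and below a rater's reference point always sum to 100.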

Estimation of the model parameters (RRPs and SAPs) was accomplished in two
steps. First, the observed ratings were transformed to z's using an inverse
normal probability function. These z's were used as the criterion values (Y
vector) in a regression model of the form:

$$ Y = cU + b_1 R_1 + b_2 R_2 + \cdots + b_n R_n + b_{n+1} S_{n+1} + b_{n+2} S_{n+2} + \cdots + b_{n+k} S_{n+k} + E $$

where:

   $U$ is a "unit vector" containing a 1 for each observation in Y;
   $R_i$ (i = 1 through n) is a vector containing a 1 if the observation in Y
      pertains to a rating given by rater i, zero otherwise;
   $S_j$ (j = n+1 through n+k) is a vector containing a 1 if the observation
      in Y is associated with subject j, zero otherwise; and,
   $c$ and $b_1$ through $b_{n+k}$ are the regression weights that minimize the
      sum of squares of the values in the error vector ($E$).
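The model above can be generated mechanically from rater-subject-rating records. The sketch below is a hypothetical stand-in for the kind of input-to-model step described next, not a reproduction of any program named in this paper: it applies the inverse normal transform to each rating and builds the 0/1 rows [U, R_1..R_n, S_{n+1}..S_{n+k}]. It assumes ratings are percents strictly between 0 and 100 (the paper does not say how ratings of exactly 0 or 100 were handled) and that rater and subject IDs are sortable.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple

# (rater_id, subject_id, rating in percent); IDs must be sortable.
Observation = Tuple[int, int, float]


def build_regression_model(observations: Sequence[Observation]):
    """Return (Y, X, raters, subjects) for Y = cU + sum b_i R_i + sum b_j S_j + E."""
    raters = sorted({r for r, _, _ in observations})
    subjects = sorted({s for _, s, _ in observations})
    r_col = {r: 1 + i for i, r in enumerate(raters)}
    s_col = {s: 1 + len(raters) + j for j, s in enumerate(subjects)}
    inverse_normal = NormalDist().inv_cdf
    y: List[float] = []
    x: List[List[float]] = []
    for rater, subject, pct in observations:
        if not 0.0 < pct < 100.0:
            raise ValueError(f"rating {pct} has no finite z; it must lie strictly between 0 and 100")
        y.append(inverse_normal(pct / 100.0))   # observed z for this pair
        row = [0.0] * (1 + len(raters) + len(subjects))
        row[0] = 1.0                 # unit vector U
        row[r_col[rater]] = 1.0      # R_i: rating given by rater i
        row[s_col[subject]] = 1.0    # S_j: rating of subject j
        x.append(row)
    return y, x, raters, subjects


if __name__ == "__main__":
    obs = [(3001, 11, 70.0), (3001, 12, 50.0), (3002, 11, 60.0)]
    y, x, raters, subjects = build_regression_model(obs)
    for z, row in zip(y, x):
        print(f"{z:+.3f}", row)
```

In every row the rater entries sum to the U entry, and so do the subject entries, so the columns are linearly dependent by construction.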

A special-purpose program, GENVEC, was used to generate the above model from
input which specified ID numbers for rater-subject pairs and associated observed
ratings. Program LMS (linear model solver), based on Ward and Jennings' (1973)
program MODEL, provided a regression analysis of the model generated by GENVEC.
The regression analysis carried out by LMS provided the regression weights (b's,
not Betas) for each vector in the model produced by GENVEC. Note, however, that
the regression program must allow for models with redundant predictor vectors
(e.g., program MODEL in Ward & Jennings, 1973), because the rater vectors sum to
the unit vector, as do the subject vectors. The z's in the criterion vector
correspond to observed distances (containing error) between raters and subjects
on the underlying stringency-ability scale. Pairs of b's and the unit vector
weight give the theoretical, "error-free" distance between a rater-subject pair:

RXTOS(I) = BOFS(I) - BOFRX + CONST

where:

RXTOS(I) is the distance from a rater (RX) to subject I;
BOFS(I) is the regression weight (b) of subject I;
BOFRX is the regression weight (b) of an arbitrarily chosen rater (RX); and,
CONST is the regression weight (c) for the unit vector (U).
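The first step can be sketched end to end in plain Python. This is a hypothetical stand-in for GENVEC and LMS, not a reproduction of either: rather than relying on a regression program that tolerates redundant predictors, it fixes the first rater's and first subject's weights at zero, which removes the redundancy without changing the least-squares fitted values, and then solves the normal equations directly. It uses the 0/1 coding described above for both rater and subject vectors, so the rater's weight enters the fitted distance with a plus sign; the sign attached to the rater term depends on how those vectors are coded. Function names and the example data are illustrative.

```python
from statistics import NormalDist
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the rater-subject design may be disconnected")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_rater_subject_model(observations):
    """Least-squares weights for z = c + b_rater + b_subject + e.

    observations: (rater_id, subject_id, rating in percent) triples.
    The first rater's and first subject's weights are fixed at 0.
    """
    to_z = NormalDist().inv_cdf
    raters = sorted({r for r, _, _ in observations})
    subjects = sorted({s for _, s, _ in observations})
    columns = {("R", r): 1 + i for i, r in enumerate(raters[1:])}
    columns.update({("S", s): len(raters) + j for j, s in enumerate(subjects[1:])})
    p = 1 + len(columns)
    xtx = [[0.0] * p for _ in range(p)]   # X'X accumulated row by row
    xty = [0.0] * p                       # X'Y
    for rater, subject, pct in observations:
        if not 0.0 < pct < 100.0:
            raise ValueError(f"rating {pct} must lie strictly between 0 and 100")
        z = to_z(pct / 100.0)
        active = [0] + [columns[key] for key in (("R", rater), ("S", subject)) if key in columns]
        for i in active:
            xty[i] += z
            for j in active:
                xtx[i][j] += 1.0
    coef = solve_linear_system(xtx, xty)
    rater_b = {r: coef[columns[("R", r)]] if ("R", r) in columns else 0.0 for r in raters}
    subject_b = {s: coef[columns[("S", s)]] if ("S", s) in columns else 0.0 for s in subjects}
    return coef[0], rater_b, subject_b


def fitted_distance(c, rater_b, subject_b, rater, subject) -> float:
    """'Error-free' z distance for a rater-subject pair under 0/1 coding."""
    return c + rater_b[rater] + subject_b[subject]


if __name__ == "__main__":
    unit = NormalDist()
    # Hypothetical error-free data: c = 0.2, rater 3002 harsher by 0.3,
    # subject 12 stronger by 0.5, subject 13 weaker by 0.2.
    truth_r, truth_s = {3001: 0.0, 3002: -0.3}, {11: 0.0, 12: 0.5, 13: -0.2}
    pairs = [(3001, 11), (3001, 12), (3001, 13), (3002, 11), (3002, 12)]
    obs = [(r, s, 100.0 * unit.cdf(0.2 + truth_r[r] + truth_s[s])) for r, s in pairs]
    c, rb, sb = fit_rater_subject_model(obs)
    # Rater 3002 never rated subject 13, yet the model still yields their distance.
    print(round(fitted_distance(c, rb, sb, 3002, 13), 6))
```

Solving with the two columns dropped keeps the column space of the original design, so any least-squares solution, whether found this way or by a program that accepts redundant vectors, gives the same fitted distances.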

The second step was to convert the regression weights into theoretical distances
and then into rater and subject locations on the stringency and ability scale.
One rater's RRP (i.e., that of the rater with the most ratings and the lowest ID
number) was arbitrarily chosen as the anchor location for the scale, and this
point was assigned an arbitrary value (i.e., 500). Once one rater's position
(i.e., stringency) was defined, all subjects could be located with respect to
that rater by the linear equation:

SLOC(I) = mCJPL + RXTOS(I) 

where: 

SLOC(I) is the location (SAP) of subject I; and,

mOJPL is the arbitrary value used for anchoring one rater's RRP. 
As soon as all subjects were located, an analogous set of equations could be
directly solved to obtain the remaining rater locations (RRPs). Program LOCATE
was used to solve these equations and obtain the theoretical, "error-free"
distances, and then the RRPs and SAPs.
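The locating step can be sketched as follows. This is an illustrative version of what the text says LOCATE does, not its code. It assumes the same 0/1 coding as the regression model and converts z distances to scale points by multiplying by SF, so that z = (SAP - RRP)/SF holds on the reported scale; the SLOC equation in the text adds RXTOS directly, which would correspond to distances already expressed in scale units. The anchor rule (most ratings, then lowest ID) follows the text; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def choose_anchor(rater_of_each_rating: Iterable[int]) -> int:
    """Rater with the most ratings; ties broken by the lowest ID number."""
    counts = Counter(rater_of_each_rating)
    return min(counts, key=lambda rater: (-counts[rater], rater))


def locate(c: float, rater_b: Dict[int, float], subject_b: Dict[int, float],
           anchor_rater: int, anchor_value: float = 500.0,
           sf: float = 100.0) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Turn regression weights into SAPs and RRPs on one stringency-ability scale."""
    def distance(rater: int, subject: int) -> float:
        # Fitted z (c + b_rater + b_subject) expressed in scale points.
        return sf * (c + rater_b[rater] + subject_b[subject])

    # Locate every subject relative to the anchor rater's fixed RRP.
    saps = {s: anchor_value + distance(anchor_rater, s) for s in subject_b}
    # Locate the remaining raters from any located subject; with error-free
    # distances every subject gives the same answer.
    reference = next(iter(saps))
    rrps = {r: saps[reference] - distance(r, reference) for r in rater_b}
    return saps, rrps


if __name__ == "__main__":
    anchor = choose_anchor([3001, 3002, 3001, 3002, 3003])  # tie broken by lowest ID
    saps, rrps = locate(0.2, {3001: 0.0, 3002: -0.3, 3003: 0.1},
                        {11: 0.0, 12: 0.5}, anchor)
    print({k: round(v, 6) for k, v in saps.items()})
    print({k: round(v, 6) for k, v in rrps.items()})
```

A rater with a negative weight gives lower ratings than the anchor and therefore receives a higher (more stringent) RRP, consistent with z = (SAP - RRP)/SF.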

Jointly, GENVEC, LMS, and LOCATE replaced STEPIT as the method of finding the
RRPs and SAPs. All of these programs were run on a DEC-10 System computer. For
simplicity in reporting, MERLIN-STEPIT refers to the earlier method and
MERLIN-LMS refers to the current regression-based method of finding RRPs and
SAPs. The new method, MERLIN-LMS, described above, was applied to 14 data sets
previously analyzed by MERLIN-STEPIT (G. Cason et al., 1983). The data were
overall ratings of the clinical performance of junior year medical students in a
Medicine Clerkship (i.e., a clinically oriented course) during 1981 and 1982 at
a school located in the Southwest U.S.A. See G. Cason, Cason, & Littlefield
(1983) for a more complete description of the data source. One rater's data (ID
3027), which were included in the earlier analyses, were excluded here because
the ratings were out of valid range. Omission of this rater removed two ratings
from data set 1981F.

Results 

As can be seen from an examination of Table 1, in 12 out of the 14 data sets
MERLIN-LMS achieved essentially as good a fit with the data as did
MERLIN-STEPIT. In 2 cohorts, 1982 E and F, MERLIN-LMS did not achieve as good a
fit. However, even in these cases MERLIN-LMS's fit with the data was still quite
good (R = .79 and .80; p < .005). The R values reported in Table 1 for
MERLIN-STEPIT's fit are not the same as those provided in the earlier report (G.
Cason, Cason, & Littlefield, 1983; Table 1). Prior to January 1984, MERLIN
contained a program coding error which substituted MSQ for SS in computing the
RSQ. This produced spuriously low reported values.

As can be seen from Table 1, in each cohort MERLIN-LMS solved the equations for
the parameters in only a small fraction of the computer time required by
MERLIN-STEPIT. In every case, MERLIN-LMS solved in under 2 minutes. In no case
did MERLIN-STEPIT solve in less than 10 minutes. In total, across all 14 data
sets, the original approach required 528 minutes while MERLIN-LMS required only
15. Therefore, MERLIN-LMS had a machine processing time advantage of
approximately 33:1 while achieving good to excellent fit with the data.
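The totals quoted above can be checked directly from the per-cohort times in Table 1. The short script below is a plain transcription of that table's "Minutes to solve" rows; it sums both methods' times and forms their ratio.

```python
# Minutes to solve for cohorts A-G, transcribed from Table 1.
LMS_MINUTES = {
    1981: [0.75, 0.59, 1.18, 1.21, 0.84, 0.95, 1.07],
    1982: [1.36, 1.93, 0.77, 0.95, 1.20, 1.12, 1.12],
}
STEPIT_MINUTES = {
    1981: [20, 17, 49, 38, 10, 35, 23],
    1982: [53, 68, 28, 53, 49, 49, 36],
}

total_lms = sum(sum(year) for year in LMS_MINUTES.values())
total_stepit = sum(sum(year) for year in STEPIT_MINUTES.values())
slowest_lms = max(max(year) for year in LMS_MINUTES.values())
fastest_stepit = min(min(year) for year in STEPIT_MINUTES.values())

print(f"MERLIN-LMS total:    {total_lms:.2f} minutes (slowest cohort {slowest_lms})")
print(f"MERLIN-STEPIT total: {total_stepit} minutes (fastest cohort {fastest_stepit})")
print(f"time ratio:          {total_stepit / total_lms:.1f} : 1")
```

Dividing the column totals (528 and about 15.04 minutes) gives a ratio of roughly 35:1; the text reports the advantage as approximately 33:1.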

Table 2 provides standard deviations for the parameters of Cason and Cason's
model found by MERLIN-LMS and MERLIN-STEPIT on each of the 14 data sets. In 10
of the 14 data sets, variation in rater standards (i.e., RRPs) was greater than
variation in student ability (SAPs). In general, MERLIN-LMS produced estimates
of the parameters with slightly higher variability than did MERLIN-STEPIT. A
notable exception to this pattern was cohort 1981F. The largest variability of
parameters, regardless of solution approach used, was found in this cohort, with
standard deviations roughly twice those found in most other cohorts. There was
nothing obvious about the pattern of variability of parameters that provided any
basis for speculating on why the fit of MERLIN-LMS's parameters was not quite as
good in 1982 E and F as that of MERLIN-STEPIT's.
Table 1. Fit of Cason and Cason's Model and Time to Solve
for Each Cohort's Data Using MERLIN-LMS and MERLIN-STEPIT

Year: 1981

Cohort                  A      B      C      D      E   F(d)      G
Multiple R(a)
  MERLIN-LMS          .93    .89    .90    .88    .87    .95    .90
  MERLIN-STEPIT(b)    .93    .90    .90    .89    .88    .96    .90
Minutes to solve
  MERLIN-LMS          .75    .59   1.18   1.21    .84    .95   1.07
  MERLIN-STEPIT(c)  20.00  17.00  49.00  38.00  10.00  35.00  23.00
Number of raters       37     28     46     38     29     29     31
Number of students     21     20     29     24     17     18     20
Total ratings         108     76    126    102     73     72     91

Year: 1982

Cohort                  A      B      C      D      E      F      G
Multiple R(a)
  MERLIN-LMS          .82    .85    .88    .89    .90    .79    .80
  MERLIN-STEPIT(b)    .86    .89    .90    .91    .89    .89    .87
Minutes to solve
  MERLIN-LMS         1.36   1.93    .77    .95   1.20   1.12   1.12
  MERLIN-STEPIT(c)  53.00  68.00  28.00  53.00  49.00  49.00  36.00
Number of raters       42     42     34     39     43     39     40
Number of students     24     24     16     21     27     25     27
Total ratings         129    129     80    114    145    131    133

(a) All R's significant at p<.005.
(b) Values in earlier report (G. Cason, Cason, & Littlefield, 1983; Table 1)
    were spuriously low due to a coding error in MERLIN which substituted MSQ
    for SS in computing RSQ.
(c) Times for previous analyses available only to the nearest minute.
(d) Both ratings of rater 3027, included in previous analyses, were excluded in
    the present study because they were out of valid range.
Table 2. Standard Deviations of Parameters as Found by
MERLIN-LMS and MERLIN-STEPIT(a)

Year: 1981

Cohort                  A      B      C      D      E      F      G
RRP
  MERLIN-LMS        34.48  36.66  37.60  32.43  28.83  63.69  29.20
  MERLIN-STEPIT     34.02  33.68  33.37  30.83  28.29  92.28  28.25
SAP
  MERLIN-LMS        28.85  41.75  28.55  23.10  21.09  36.17  23.33
  MERLIN-STEPIT     2?.93  38.15  26.72  21.28  21.45  49.74  22.93
All
  MERLIN-LMS        36.60  43.21  41.33  34.44  30.67  65.43  38.57
  MERLIN-STEPIT     36.24  40.01  38.14  32.82  30.49  86.86  37.78

Year: 1982

Cohort                  A      B      C      D      E      F      G
RRP
  MERLIN-LMS        44.03  34.95  29.27  39.34  41.72  50.59  39.15
  MERLIN-STEPIT     39.41  33.42  27.89  38.04  31.53  31.32  24.37
SAP
  MERLIN-LMS        39.96  35.44  38.34  27.88  28.63  52.69  40.96
  MERLIN-STEPIT     33.97  33.49  36.99  26.06  20.01  37.33  26.91
All
  MERLIN-LMS        47.08  40.12  38.39  41.17  41.80  62.54  44.46
  MERLIN-STEPIT     42.27  38.48  37.09  39.66  33.06  47.24  31.19

(a) Values for MERLIN-STEPIT obtained from analyses in previous study but not
    included in original report of that study (Cason, Cason, & Littlefield,
    1983).
Table 3. Correlation between Parameters as Found by
MERLIN-LMS and MERLIN-STEPIT

Year: 1981

Cohort                  A      B      C      D      E      F      G
RRP                  1.00    .99    .99    .98    .97    .99   1.00
SAP                  1.00    .99    .97    .96    .96    .99    .99
All                  1.00    .99    .99    .98    .98    .99   1.00

Year: 1982

Cohort                  A      B      C      D      E      F      G
RRP                   .97    .99    .98    .98    .87    .84    .90
SAP                   .96    .99    .97    .97    .91    .87    .88
All                   .97    .99    .99    .99    .90    .90    .91



Table 4. Single Rater Reliabilities and Validities of
Observed and Adjusted Ratings

Year: 1981

Cohort                      A      B      C      D      E      F      G
Observed ratings          .30    .31    .24    .22    .22    .16    .23
Adjusted ratings
  MERLIN-LMS              .87    .61    .80    .87    .84   1.00    .98
  MERLIN-STEPIT(a)        .87    .61    .80    .87    .84   1.00    .96
Ratings per student(b)   5.13   3.79   4.33   4.24   4.28   3.97   4.53

Year: 1982

Cohort                      A      B      C      D      E      F      G
Observed ratings          .08    .24    .40    .18    .33    .10    .35
Adjusted ratings
  MERLIN-LMS              .37    .65    .76    .89    .58    .76    .88
  MERLIN-STEPIT(a)        .37    .65    .76    .89    .58    .76    .88
Ratings per student(b)   5.37   5.37   4.99   5.42   5.37   5.23   4.92

(a) Values reported previously (Cason, Cason, & Littlefield, Table 3).
(b) The geometric mean number of ratings per student (k) is used in the
    Spearman-Brown expansion formula to determine the reliability of the
    average of k independent ratings.
Table 3 provides correlations between the parameters estimated by the two
methods for each of the 14 cohorts. For RRPs these ranged from .84 to 1.00 with
a mean of .96; for SAPs, from .87 to 1.00 with a mean of .95; and, for all
parameters considered together, from .90 to 1.00 with a mean of .97. In most of
the cohorts the parameter estimates were essentially the same. As expected, in
the two cohorts in which the Cason and Cason model fit the data least well, 1982
E and F, the correspondence between parameters estimated by the two methods was
lowest. However, even in these two cases the correlations between the parameter
estimates were moderately high.

Single rater reliabilities and validities for observed ratings and adjusted
ratings for MERLIN-LMS and MERLIN-STEPIT are given in Table 4. The two
procedures (LMS and STEPIT) resulted in the same estimated reliabilities, even
in those cohorts where the fit achieved by LMS was somewhat less than that of
STEPIT. These reliabilities are intra-class correlations computed by a variation
of the analysis of variance procedure recommended by Ebel (1951). Details of
our procedure were provided in a previous report (Cason, Cason, & Littlefield,
1983). Single rater reliabilities may be understood either as the reliability
associated with one rater's rating of a student, or as the average inter-rater
correlation expected from randomly chosen pairs of raters. Because students
were rated by multiple raters, the overall reliability of ratings in a group is
estimated by using the geometric mean number of ratings per student (k) in the
Spearman-Brown expansion formula. As we have discussed elsewhere (Cason &
Cason, 1984), the single rater reliabilities may be equally well interpreted as
convergent validity indexes. As the conventional validity expansion formula
shows, k's effect on improving validity is much less pronounced than on
improving reliability.
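The reliability computations described above can be sketched compactly. The single-rater reliability below is the textbook one-way ANOVA intraclass correlation for unequal numbers of ratings per student; the authors used a variation of Ebel's (1951) procedure whose details are in their earlier report, so this is an approximation of their computation, not a copy. The Spearman-Brown formula is standard. For the validity expansion we assume the usual formula for the validity of a lengthened measure, v_k = v_1 * sqrt(k / (1 + (k - 1) r_1)), since the paper does not state it. All names and example values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, Iterable, Sequence, Tuple


def single_rater_icc(ratings: Iterable[Tuple[Hashable, float]]) -> float:
    """One-way ANOVA intraclass correlation for a single rating.

    ratings: (student_id, score) pairs; unequal numbers of ratings per
    student are handled with the usual adjusted group size k0.
    """
    groups = defaultdict(list)
    for student, score in ratings:
        groups[student].append(score)
    sizes = [len(g) for g in groups.values()]
    a, n = len(groups), sum(sizes)
    grand = sum(sum(g) for g in groups.values()) / n
    means = {s: sum(g) / len(g) for s, g in groups.items()}
    ss_between = sum(len(g) * (means[s] - grand) ** 2 for s, g in groups.items())
    ss_within = sum((x - means[s]) ** 2 for s, g in groups.items() for x in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (n - a)
    k0 = (n - sum(k * k for k in sizes) / n) / (a - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


def geometric_mean(counts: Sequence[float]) -> float:
    """Geometric mean, e.g. of the number of ratings each student received."""
    return math.exp(sum(math.log(k) for k in counts) / len(counts))


def spearman_brown(r1: float, k: float) -> float:
    """Reliability of the average of k ratings, given single-rater reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)


def lengthened_validity(v1: float, r1: float, k: float) -> float:
    """Validity of the average of k ratings (criterion not lengthened)."""
    return v1 * math.sqrt(k / (1 + (k - 1) * r1))


if __name__ == "__main__":
    r1 = 0.30  # hypothetical single-rater reliability, also read as validity
    k = 5.0    # hypothetical number of ratings per student
    print(f"reliability of the mean: {spearman_brown(r1, k):.2f}")
    print(f"validity of the mean:    {lengthened_validity(r1, r1, k):.2f}")
```

With r_1 = .30 and k = 5, the reliability of the mean rises to about .68 while its validity rises only to about .45, which illustrates why k improves validity much less than reliability.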

Importance

Unlike separately z-transforming each rater's ratings (Ebel, 1951) or
handicapping as recommended by Littlefield et al. (1984), neither of which
includes a test of the assumptions used to justify these methods of removing
rater bias from ratings, the Casons' model provides a direct means for testing
its general assumptions (i.e., fit with data) and contrasting it with the most
common alternative model. The regression-based method of solving for the
parameters of the Casons' model makes it more accessible and economical to
conduct research on their theory and to apply the technology in practical
settings to achieve more nearly reliable and valid performance measures. The
increased economy of this method over the earlier one further expands the cost
advantage of statistical control of rater bias when compared with direct control
methods such as rater training. The greater familiarity of regression methods
makes this approach easier to understand and use by a majority of educational
researchers. The greater speed, ease of use, and thus economy of this approach
are achieved at practically no cost in accuracy of solution. The regression
approach provides adjusted scores whose reliability and validity are improved to
the same (large) degree that the earlier, more cumbersome approach attained.